# Proximal Policy Optimization
Stable Vicuna 13b Delta
StableVicuna-13B is a fine-tuned version of the Vicuna-13B v0 model, trained with Reinforcement Learning from Human Feedback (RLHF) via Proximal Policy Optimization (PPO) on various dialogue and instruction datasets.
Large Language Model
Transformers English

CarperAI
31
455
PPO LunarLander-v2
A reinforcement learning agent trained with the PPO algorithm to solve the landing task in the LunarLander-v2 environment.
Physics Model
sb3
73
0